WWW Spiders: an introduction

Author

  • Massimiliano Zanin
Abstract

In recent years, the study of complex networks has received a lot of attention. Real systems, including information networks and relationships between persons and users, have gained importance in scientific publications, despite an important drawback: the difficulty of retrieving and managing such a large quantity of information. This paper is intended as an introduction to the construction of spiders and scrapers: specifically, how to program and safely deploy this kind of software application. The aim is to show how software can be prepared to automatically surf the net and retrieve information for the user with high efficiency and safety.

Similar resources

Information retrieval on Internet using meta-search engines: A review

Introduction: Though automatic information retrieval (IR) existed before the World Wide Web (WWW), the post-Internet era has made it indispensable. IR is a subfield of computer science concerned with presenting relevant information, gathered from online information sources, to users in response to search queries. Various types of IR tools have been created solely to search for information on the Internet. Apart ...

Analysing Users WWW Search Behaviour

In a recent study [1], Internet users ranked search as their most important activity, awarding it a 9.1 on a 10-point scale. The next most important activity ranked only 6.3. Internet search engines are continually updating their indexes, and scaling up their parallel processors to keep up with the growth of the WWW. It is estimated that there are 800 million indexable pages in the WWW [2], and...

Lost in Hyperspace? Free Text Searches in the Web

The World Wide Web (WWW) [LCG92] is a distributed hypermedia system for information discovery, retrieval, and collaboration. The hypertext paradigm has proven its usefulness for browsing large, distributed document structures. The ease of use provided by this paradigm is one of the reasons for the great popularity which the World Wide Web has gained through the last months. However, as the amou...

Comparison of Three Vertical Search Spiders

The Web has plenty of useful resources, but its dynamic, unstructured nature makes them difficult to locate. Search engines help, but the number of Web pages now exceeds two billion, making it difficult for general-purpose engines to maintain comprehensive, up-to-date search indexes. Moreover, as the Web grows ever larger, so does information overload in query results. A general-purpose search e...

Journal:
  • CoRR

Volume: abs/0710.5054

Publication date: 2007